CoGrOO: a Brazilian-Portuguese Grammar Checker based on the CETENFOLHA Corpus

Authors

  • Jorge Kinoshita
  • Laís do Nascimento Salvador
  • Carlos Eduardo Dantas de Menezes
Abstract

This paper describes an ongoing Portuguese grammar checker project, CoGrOO (Corretor Gramatical para OpenOffice, i.e., Grammar Checker for OpenOffice), which is based on CETENFOLHA, a morphosyntactically annotated corpus of Brazilian Portuguese. Two of the project's features are highlighted: a hybrid architecture that mixes rules and statistics, and its status as a free software project. The project aims to check grammatical errors such as nominal and verbal agreement, “crase” (the contraction of the preposition “a” (to) with the definite feminine singular article “a”, yielding “à”), nominal and verbal government, and other errors common in Brazilian Portuguese. We also present some empirical results obtained with the implemented techniques.
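To make the hybrid idea (statistical annotation followed by rule-based checking) concrete, here is a minimal Python sketch of one rule type, nominal agreement. It is not CoGrOO's implementation. The `Token` class, the tag names (`DET`, `N`, `ADJ`), the tiny `LEXICON` that stands in for the statistical tagger, and the adjacent-pair rule are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    word: str
    pos: str               # hypothetical tags: "DET", "N", "ADJ", "UNK"
    gender: Optional[str]  # "M" or "F"
    number: Optional[str]  # "S" (singular) or "P" (plural)


# Hypothetical stand-in for the statistical annotation stage: a hand-made
# lexicon. A real tagger would also disambiguate words such as "a"
# (article vs. preposition), which this toy lexicon always tags as DET.
LEXICON = {
    "o": ("DET", "M", "S"), "a": ("DET", "F", "S"),
    "os": ("DET", "M", "P"), "as": ("DET", "F", "P"),
    "casa": ("N", "F", "S"), "casas": ("N", "F", "P"),
    "carro": ("N", "M", "S"), "carros": ("N", "M", "P"),
    "bonito": ("ADJ", "M", "S"), "bonita": ("ADJ", "F", "S"),
    "bonitos": ("ADJ", "M", "P"), "bonitas": ("ADJ", "F", "P"),
}


def annotate(sentence: str) -> List[Token]:
    """Tag each whitespace-separated word; unknown words get 'UNK'."""
    tokens = []
    for word in sentence.lower().split():
        pos, gender, number = LEXICON.get(word, ("UNK", None, None))
        tokens.append(Token(word, pos, gender, number))
    return tokens


# Hypothetical rule: adjacent determiner+noun and noun+adjective pairs
# must agree in both gender and number.
AGREEING_PAIRS = {("DET", "N"), ("N", "ADJ")}


def check_nominal_agreement(tokens: List[Token]) -> List[Tuple[int, str]]:
    """Return (index of left token, message) for each disagreeing pair."""
    errors = []
    for i in range(len(tokens) - 1):
        left, right = tokens[i], tokens[i + 1]
        if (left.pos, right.pos) not in AGREEING_PAIRS:
            continue
        if left.gender != right.gender or left.number != right.number:
            errors.append((i, f"'{left.word} {right.word}': gender/number mismatch"))
    return errors


if __name__ == "__main__":
    for sentence in ["a casa bonito", "os carro", "as casas bonitas"]:
        print(sentence, check_nominal_agreement(annotate(sentence)))
```

Here "a casa bonito" is flagged at the noun/adjective pair and "os carro" at the determiner/noun pair, while "as casas bonitas" passes. Splitting annotation from rules is what lets the rule stay this simple; a real checker would match longer noun phrases and depend on the tagger to resolve ambiguities such as the article/preposition reading of "a" that underlies crase errors.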

Similar resources

Improving CoGrOO: the Brazilian Portuguese Grammar Checker

This paper highlights the main results obtained in an effort to improve the grammar checker CoGrOO, a hybrid system that first annotates the text using statistical Natural Language Processing (NLP) techniques and then applies a rule-based analysis to identify possible grammar errors. The goal was to reduce omissions and false alarms while improving true positives without adding new error ru...

Baseline Acoustic Models for Brazilian Portuguese Using CMU Sphinx Tools

Advances in speech processing research rely on the availability of public resources such as corpora, statistical models and baseline systems. In contrast to languages such as English, there are few specific resources for Brazilian Portuguese. This work describes efforts aiming to narrow this gap. Baseline acoustic models for Brazilian Portuguese were built using the CMU Sphinx toolkit and pub...

Phonologic Patterns of Brazilian Portuguese: a grapheme to phoneme converter based study

This paper presents Brazilian Portuguese phoneme distribution patterns, as produced by an automatic, grammar-rules-based grapheme-to-phoneme converter. The software Nhenhém (Vasilévski, 2008) was used for treating data: written texts which were decoded into phonologic symbols, forming a corpus, and subjected to a statistical analysis. Results support the high level of predictability of Brazilian...

Segmentation Strategies to Face Morphology Challenges in Brazilian-Portuguese/English Statistical Machine Translation and Its Integration in Cross-Language Information Retrieval

The use of morphology is particularly interesting in the context of statistical machine translation in order to reduce data sparseness and compensate for any lack of training corpus. In this work, we propose several approaches to introduce morphology knowledge into a standard phrase-based machine translation system. We provide word segmentation using two different tools (COGROO and MORFESSOR) which...

Towards a Phonetic Brazilian Portuguese Spell Checker

Spell checking is no longer considered a big challenge for natural language processing, at least regarding the task of correcting documents during editing. Nevertheless, without human interaction, it is necessary to automatically choose the word that will most likely correct the misspelled word. Also, there is a further difficulty for spell checking: new types of errors on the web material have...

Publication date: 2006